视频框架插值〜(VFI)算法近年来由于数据驱动算法及其实现的前所未有的进展,近年来有了显着改善。最近的研究引入了高级运动估计或新颖的扭曲方法,以解决具有挑战性的VFI方案。但是,没有发表的VFI作品认为插值误差(IE)的空间不均匀特征。这项工作引入了这样的解决方案。通过密切检查光流与IE之间的相关性,本文提出了新的错误预测指标,该指标将中间框架分为与不同IE水平相对应的不同区域。它基于IE驱动的分割,并通过使用新颖的错误控制损耗函数,引入了一组空间自适应插值单元的合奏,该单元逐步处理并集成了分段区域。这种空间合奏会产生有效且具有诱人的VFI解决方案。对流行视频插值基准测试的广泛实验表明,所提出的解决方案在当前兴趣的应用中优于当前最新(SOTA)。
translated by 谷歌翻译
Real-world robotic grasping can be done robustly if a complete 3D Point Cloud Data (PCD) of an object is available. However, in practice, PCDs are often incomplete when objects are viewed from few and sparse viewpoints before the grasping action, leading to the generation of wrong or inaccurate grasp poses. We propose a novel grasping strategy, named 3DSGrasp, that predicts the missing geometry from the partial PCD to produce reliable grasp poses. Our proposed PCD completion network is a Transformer-based encoder-decoder network with an Offset-Attention layer. Our network is inherently invariant to the object pose and point's permutation, which generates PCDs that are geometrically consistent and completed properly. Experiments on a wide range of partial PCD show that 3DSGrasp outperforms the best state-of-the-art method on PCD completion tasks and largely improves the grasping success rate in real-world scenarios. The code and dataset will be made available upon acceptance.
translated by 谷歌翻译
Research on automated essay scoring has become increasing important because it serves as a method for evaluating students' written-responses at scale. Scalable methods for scoring written responses are needed as students migrate to online learning environments resulting in the need to evaluate large numbers of written-response assessments. The purpose of this study is to describe and evaluate three active learning methods than can be used to minimize the number of essays that must be scored by human raters while still providing the data needed to train a modern automated essay scoring system. The three active learning methods are the uncertainty-based, the topological-based, and the hybrid method. These three methods were used to select essays included as part of the Automated Student Assessment Prize competition that were then classified using a scoring model that was training with the bidirectional encoder representations from transformer language model. All three active learning methods produced strong results, with the topological-based method producing the most efficient classification. Growth rate accuracy was also evaluated. The active learning methods produced different levels of efficiency under different sample size allocations but, overall, all three methods were highly efficient and produced classifications that were similar to one another.
translated by 谷歌翻译
Based on WHO statistics, many individuals are suffering from visual problems, and their number is increasing yearly. One of the most critical needs they have is the ability to navigate safely, which is why researchers are trying to create and improve various navigation systems. This paper provides a navigation concept based on the visual slam and Yolo concepts using monocular cameras. Using the ORB-SLAM algorithm, our concept creates a map from a predefined route that a blind person most uses. Since visually impaired people are curious about their environment and, of course, to guide them properly, obstacle detection has been added to the system. As mentioned earlier, safe navigation is vital for visually impaired people, so our concept has a path-following part. This part consists of three steps: obstacle distance estimation, path deviation detection, and next-step prediction, done by monocular cameras.
translated by 谷歌翻译
The classification of sleep stages plays a crucial role in understanding and diagnosing sleep pathophysiology. Sleep stage scoring relies heavily on visual inspection by an expert that is time consuming and subjective procedure. Recently, deep learning neural network approaches have been leveraged to develop a generalized automated sleep staging and account for shifts in distributions that may be caused by inherent inter/intra-subject variability, heterogeneity across datasets, and different recording environments. However, these networks ignore the connections among brain regions, and disregard the sequential connections between temporally adjacent sleep epochs. To address these issues, this work proposes an adaptive product graph learning-based graph convolutional network, named ProductGraphSleepNet, for learning joint spatio-temporal graphs along with a bidirectional gated recurrent unit and a modified graph attention network to capture the attentive dynamics of sleep stage transitions. Evaluation on two public databases: the Montreal Archive of Sleep Studies (MASS) SS3; and the SleepEDF, which contain full night polysomnography recordings of 62 and 20 healthy subjects, respectively, demonstrates performance comparable to the state-of-the-art (Accuracy: 0.867;0.838, F1-score: 0.818;0.774 and Kappa: 0.802;0.775, on each database respectively). More importantly, the proposed network makes it possible for clinicians to comprehend and interpret the learned connectivity graphs for sleep stages.
translated by 谷歌翻译
To achieve autonomy in a priori unknown real-world scenarios, agents should be able to: i) act from high-dimensional sensory observations (e.g., images), ii) learn from past experience to adapt and improve, and iii) be capable of long horizon planning. Classical planning algorithms (e.g. PRM, RRT) are proficient at handling long-horizon planning. Deep learning based methods in turn can provide the necessary representations to address the others, by modeling statistical contingencies between observations. In this direction, we introduce a general-purpose planning algorithm called PALMER that combines classical sampling-based planning algorithms with learning-based perceptual representations. For training these perceptual representations, we combine Q-learning with contrastive representation learning to create a latent space where the distance between the embeddings of two states captures how easily an optimal policy can traverse between them. For planning with these perceptual representations, we re-purpose classical sampling-based planning algorithms to retrieve previously observed trajectory segments from a replay buffer and restitch them into approximately optimal paths that connect any given pair of start and goal states. This creates a tight feedback loop between representation learning, memory, reinforcement learning, and sampling-based planning. The end result is an experiential framework for long-horizon planning that is significantly more robust and sample efficient compared to existing methods.
translated by 谷歌翻译
Machine reading comprehension (MRC) is a long-standing topic in natural language processing (NLP). The MRC task aims to answer a question based on the given context. Recently studies focus on multi-hop MRC which is a more challenging extension of MRC, which to answer a question some disjoint pieces of information across the context are required. Due to the complexity and importance of multi-hop MRC, a large number of studies have been focused on this topic in recent years, therefore, it is necessary and worth reviewing the related literature. This study aims to investigate recent advances in the multi-hop MRC approaches based on 31 studies from 2018 to 2022. In this regard, first, the multi-hop MRC problem definition will be introduced, then 31 models will be reviewed in detail with a strong focus on their multi-hop aspects. They also will be categorized based on their main techniques. Finally, a fine-grain comprehensive comparison of the models and techniques will be presented.
translated by 谷歌翻译
Multi-hop Machine reading comprehension is a challenging task with aim of answering a question based on disjoint pieces of information across the different passages. The evaluation metrics and datasets are a vital part of multi-hop MRC because it is not possible to train and evaluate models without them, also, the proposed challenges by datasets often are an important motivation for improving the existing models. Due to increasing attention to this field, it is necessary and worth reviewing them in detail. This study aims to present a comprehensive survey on recent advances in multi-hop MRC evaluation metrics and datasets. In this regard, first, the multi-hop MRC problem definition will be presented, then the evaluation metrics based on their multi-hop aspect will be investigated. Also, 15 multi-hop datasets have been reviewed in detail from 2017 to 2022, and a comprehensive analysis has been prepared at the end. Finally, open issues in this field have been discussed.
translated by 谷歌翻译
Designing efficient and labor-saving prosthetic hands requires powerful hand gesture recognition algorithms that can achieve high accuracy with limited complexity and latency. In this context, the paper proposes a compact deep learning framework referred to as the CT-HGR, which employs a vision transformer network to conduct hand gesture recognition using highdensity sEMG (HD-sEMG) signals. The attention mechanism in the proposed model identifies similarities among different data segments with a greater capacity for parallel computations and addresses the memory limitation problems while dealing with inputs of large sequence lengths. CT-HGR can be trained from scratch without any need for transfer learning and can simultaneously extract both temporal and spatial features of HD-sEMG data. Additionally, the CT-HGR framework can perform instantaneous recognition using sEMG image spatially composed from HD-sEMG signals. A variant of the CT-HGR is also designed to incorporate microscopic neural drive information in the form of Motor Unit Spike Trains (MUSTs) extracted from HD-sEMG signals using Blind Source Separation (BSS). This variant is combined with its baseline version via a hybrid architecture to evaluate potentials of fusing macroscopic and microscopic neural drive information. The utilized HD-sEMG dataset involves 128 electrodes that collect the signals related to 65 isometric hand gestures of 20 subjects. The proposed CT-HGR framework is applied to 31.25, 62.5, 125, 250 ms window sizes of the above-mentioned dataset utilizing 32, 64, 128 electrode channels. The average accuracy over all the participants using 32 electrodes and a window size of 31.25 ms is 86.23%, which gradually increases till reaching 91.98% for 128 electrodes and a window size of 250 ms. The CT-HGR achieves accuracy of 89.13% for instantaneous recognition based on a single frame of HD-sEMG image.
translated by 谷歌翻译
Pronoun resolution is a challenging subset of an essential field in natural language processing called coreference resolution. Coreference resolution is about finding all entities in the text that refers to the same real-world entity. This paper presents a hybrid model combining multiple rulebased sieves with a machine-learning sieve for pronouns. For this purpose, seven high-precision rule-based sieves are designed for the Persian language. Then, a random forest classifier links pronouns to the previous partial clusters. The presented method demonstrates exemplary performance using pipeline design and combining the advantages of machine learning and rulebased methods. This method has solved some challenges in end-to-end models. In this paper, the authors develop a Persian coreference corpus called Mehr in the form of 400 documents. This corpus fixes some weaknesses of the previous corpora in the Persian language. Finally, the efficiency of the presented system compared to the earlier model in Persian is reported by evaluating the proposed method on the Mehr and Uppsala test sets.
translated by 谷歌翻译